Video object detection still faces several challenges. For example, the imbalance between positive and negative samples lowers information processing efficiency, and detection performance declines under abnormal conditions in video. This paper studies video object detection based on local attention to address these challenges. We propose a local attention sequence model and optimize the parameters and computation of ConvGRU. The resulting model processes spatial and temporal information in videos more efficiently and ultimately improves detection performance under abnormal conditions. Experiments on ImageNet VID show that our method improves detection accuracy by 5.3%, and visualization results show that the method adapts to different abnormal conditions, thereby improving the reliability of video object detection.